35 research outputs found
Recommended from our members
Efficient spiking neural network model of pattern motion selectivity in visual cortex
Simulating large-scale models of biological motion perception is challenging, due to the required memory to store the network structure and the computational power needed to quickly solve the neuronal dynamics. A low-cost yet high-performance approach to simulating large-scale neural network models in real-time is to leverage the parallel processing capability of graphics processing units (GPUs). Based on this approach, we present a two-stage model of visual area MT that we believe to be the first large-scale spiking network to demonstrate pattern direction selectivity. In this model, component-direction-selective (CDS) cells in MT linearly combine inputs from V1 cells that have spatiotemporal receptive fields according to the motion energy model of Simoncelli and Heeger. Pattern-direction-selective (PDS) cells in MT are constructed by pooling over MT CDS cells with a wide range of preferred directions. Responses of our model neurons are comparable to electrophysiological results for grating and plaid stimuli as well as speed tuning. The behavioral response of the network in a motion discrimination task is in agreement with psychophysical data. Moreover, our implementation outperforms a previous implementation of the motion energy model by orders of magnitude in terms of computational speed and memory usage. The full network, which comprises 153,216 neurons and approximately 40 million synapses, processes 20 frames per second of a 40 × 40 input video in real-time using a single off-the-shelf GPU. To promote the use of this algorithm among neuroscientists and computer vision researchers, the source code for the simulator, the network, and analysis scripts are publicly available. © 2014 Springer Science+Business Media New York
Energy-Efficient Instruction Set Synthesis for Application-Specific Processors
Several techniques have been proposed to enhance the energy-efficiency of ASIPs (Application-Specific Instruction set Processors). While those techniques can reduce the energy consumption with a minimal change in the instruction set (IS), they fail to exploit the opportunity of designing the entire IS from the energy-efficiency perspective. In this paper, we present an energy-efficient IS synthesis technique that can comprehensively reduce the energy-delay product (EDP) of ASIPs through optimal instruction encoding, considering both the instruction bitwidth and the dynamic instruction count. Experimental results with a typical embedded RISC processor show that our technique can generate application-specific IS's that are up to 40% more energy-efficient than the native IS for several application benchmarks.
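The energy-delay product named in the abstract above is simply energy multiplied by delay, so a narrower encoding can pay off even if it raises the dynamic instruction count. The sketch below illustrates that trade-off under loudly stated assumptions: the proxy model (fetch energy proportional to bits fetched, one cycle per instruction), the function names, and all numbers are hypothetical and are not taken from the paper.

```python
def edp(energy, delay):
    """Energy-delay product; lower is better."""
    return energy * delay


def toy_edp(bitwidth, dynamic_count, energy_per_bit=1.0, cycle_time=1.0):
    """Toy EDP estimate for an instruction set (illustrative proxy only).

    Assumption: fetch energy scales with the total number of instruction
    bits fetched, i.e. bitwidth * dynamic instruction count.
    Assumption: every instruction takes exactly one cycle.
    """
    energy = bitwidth * dynamic_count * energy_per_bit
    delay = dynamic_count * cycle_time
    return edp(energy, delay)


# Hypothetical comparison: a 32-bit native IS versus a 16-bit synthesized IS
# that needs 20% more dynamic instructions to do the same work.
native = toy_edp(bitwidth=32, dynamic_count=1000)
custom = toy_edp(bitwidth=16, dynamic_count=1200)
reduction = 1 - custom / native  # fractional EDP reduction under this proxy
```

In this toy model the delay term is counted twice (once in energy, once in delay), which is why an increase in dynamic instruction count erodes the benefit of a narrower encoding faster than a linear estimate would suggest.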
Compilation approach for coarse-grained reconfigurable architectures
Coarse-grained reconfigurable architectures can enhance the performance of critical loops and computation-intensive functions. Such architectures need efficient compilation techniques to map algorithms onto customized architectural configurations. A new compilation approach uses a generic reconfigurable architecture to tackle the memory bottleneck that typically limits the performance of many applications.
Evaluating memory architectures for media applications on coarse-grained reconfigurable architectures
Reconfigurable ALU Array (RAA) architectures - representing a popular class of Coarse-grained Reconfigurable Architectures - are gaining in popularity especially for media applications due to their flexibility, regularity, and efficiency. In such architectures, memory is critical not only for configuration data but also for the heavy data traffic required by the application. In this paper, we offer a scheme for system designers to quickly estimate the performance of media applications on RAA architectures. Our experimental results demonstrate the flexibility of our memory architecture evaluation scheme as well as the varying effects of the memory architectures on the application performance.
Exploiting Heterogeneous Mobile Architectures Through a Unified Runtime Framework
Modern mobile SoCs are typically integrated with multiple heterogeneous hardware accelerators such as GPU and DSP. Resource heavy applications such as object detection and image recognition based on convolutional neural networks are accelerated by offloading these computation-intensive algorithms to the accelerators to meet their stringent performance constraints. Conventionally there are device-specific runtime and programming languages supported for programming each accelerator, and these offloading tasks are typically pre-mapped to a specific compute unit at compile time, missing the opportunity to exploit other underutilized compute resources to gain better performance. To address this shortcoming, we present SURF: a Self-aware Unified Runtime Framework for Parallel Programs on Heterogeneous Mobile Architectures. SURF supports several heterogeneous parallel programming languages (including OpenMP and OpenCL), and enables dynamic task-mapping to heterogeneous resources based on runtime measurement and prediction. The measurement and monitoring loop enables self-aware adaptation of run-time mapping to exploit the best available resource dynamically. Our SURF framework has been implemented on a Qualcomm Snapdragon 835 development board and evaluated on a mix of image recognition (CNN), image filtering applications and synthetic benchmarks to demonstrate the versatility and efficacy of our unified runtime framework.
Test-case generation for embedded simulink via formal concept analysis
Mutation testing suffers from the high computational cost of automated test-vector generation, due to the large number of mutants that can be derived from programs and the cost of generating test-cases in a white-box manner. We propose a novel algorithm for mutation-based test-case generation for Simulink models that combines white-box testing with formal concept analysis. By exploiting similarity measures on mutants, we are able to effectively generate small sets of short test-cases that achieve high coverage on a collection of Simulink models from the automotive domain. Experiments show that our algorithm performs significantly better than random testing or simpler mutation-testing approaches
Mapping Spiking Neural Networks to Neuromorphic Hardware
Neuromorphic hardware implements biological neurons and synapses to execute spiking neural network (SNN)-based machine learning. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition an SNN into clusters of synapses, where intracluster local synapses are mapped within crossbars of the hardware and intercluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion and improves application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a metaheuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on state-of-the-art neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and spike latency by 21%, compared to the best-performing SNN mapping technique.
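The clustering objective described in the abstract above, the number of spikes carried by global (intercluster) synapses, can be written down as a short cost function. The sketch below shows only that objective for a given partition; the abstract does not describe the SpiNeCluster heuristic itself, so none is attempted here. The function name, data layout, and spike counts are hypothetical.

```python
def global_spike_count(synapses, cluster_of):
    """Total spikes on synapses whose endpoints lie in different clusters.

    synapses:   iterable of (pre_neuron, post_neuron, spike_count) tuples
    cluster_of: dict mapping each neuron id to its cluster id
    """
    return sum(
        spikes
        for pre, post, spikes in synapses
        if cluster_of[pre] != cluster_of[post]
    )


# Hypothetical four-neuron network with per-synapse spike counts.
synapses = [(0, 1, 50), (1, 2, 5), (2, 3, 40), (0, 3, 2)]

# Partition A keeps the two busiest synapses inside clusters.
partition_a = {0: 0, 1: 0, 2: 1, 3: 1}
# Partition B splits every connected pair across clusters.
partition_b = {0: 0, 2: 0, 1: 1, 3: 1}

cost_a = global_spike_count(synapses, partition_a)
cost_b = global_spike_count(synapses, partition_b)
```

Under this objective partition A is preferable, because only the lightly used synapses cross the shared interconnect.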
Unsupervised heart-rate estimation in wearables with liquid states and a probabilistic readout
Heart-rate estimation is a fundamental feature of modern wearable devices. In this paper we propose a machine learning technique to estimate heart-rate from electrocardiogram (ECG) data collected using wearable devices. The novelty of our approach lies in (1) encoding spatio-temporal properties of ECG signals directly into spike trains and using these to excite recurrently connected spiking neurons in a Liquid State Machine computation model; (2) a novel learning algorithm; and (3) an intelligently designed unsupervised readout based on Fuzzy c-Means clustering of spike responses from a subset of neurons (Liquid states), selected using particle swarm optimization. Our approach differs from existing works by learning directly from ECG signals (allowing personalization), without requiring costly data annotations. Additionally, our approach can be easily implemented on state-of-the-art spiking-based neuromorphic systems, offering high accuracy with a significantly lower energy footprint, leading to an extended battery-life of wearable devices. We validated our approach with CARLsim, a GPU accelerated spiking neural network simulator modeling Izhikevich spiking neurons with Spike Timing Dependent Plasticity (STDP) and homeostatic scaling. A range of subjects is considered from in-house clinical trials and public ECG databases. Results show high accuracy and low energy footprint in heart-rate estimation across subjects with and without cardiac irregularities, signifying the strong potential of this approach to be integrated in future wearable devices.
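For readers unfamiliar with the Izhikevich neuron mentioned in the abstract above, the following is a minimal single-neuron sketch using the standard published two-variable model with the commonly quoted regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8). It is not CARLsim code and does not use CARLsim's API; the forward-Euler integration, the step size, the function name, and the input currents are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron driven by a constant input current.

    Returns the list of spike times in milliseconds.
    Assumption: forward-Euler integration with a fixed step of dt ms.
    """
    v = c        # membrane potential (mV)
    u = b * v    # recovery variable
    spike_times = []
    for step in range(int(duration_ms / dt)):
        # Model equations: v' = 0.04 v^2 + 5 v + 140 - u + I, u' = a (b v - u)
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:  # spike threshold: record the spike, then reset
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


spikes_driven = simulate_izhikevich(current=10.0)  # sustained input
spikes_silent = simulate_izhikevich(current=0.0)   # no input
```

With no input the neuron relaxes to its resting potential and stays silent, while a sustained positive current produces a spike train whose timing could then feed a readout stage.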